Principal Component Analysis (PCA) and its exponential family extensions have three components: observations, latents and parameters of a linear transformation. We consider a generalised setting where the canonical parameters of the exponential family are a nonlinear transformation of the latents. We show explicit relationships between particular neural network architectures and the corresponding statistical models. We find that deep equilibrium models -- a recently introduced class of implicit neural networks -- solve maximum a-posteriori (MAP) estimates for the latents and parameters of the transformation. Our analysis provides a systematic way to relate activation functions, dropout, and layer structure, to statistical assumptions about the observations, thus providing foundational principles for unsupervised DEQs. For hierarchical latents, individual neurons can be interpreted as nodes in a deep graphical model. Our DEQ feature maps are end-to-end differentiable, enabling fine-tuning for downstream tasks.
translated by 谷歌翻译
我们考虑了持续的武装匪徒问题,在汇总反馈下的固定预算范围内推荐最好的武器。这是通过精确奖励不可能或获得昂贵的应用程序的激励,而可提供聚合奖励或反馈,例如子集的平均值。我们假设它们来自高斯进程并提出高斯工艺乐观优化(GPOO)算法来限制一组奖励功能。我们自适应地构造一个树的树,作为臂空间的子集,在那里反馈是节点代表的聚合奖励。我们为建议武器的汇总反馈提出了一个新的简单遗憾概念。我们为所提出的算法提供理论分析,并将单点反馈恢复为特殊情况。我们说明了GPoo并将其与模拟数据的相关算法进行比较。
translated by 谷歌翻译
Purpose: Tracking the 3D motion of the surgical tool and the patient anatomy is a fundamental requirement for computer-assisted skull-base surgery. The estimated motion can be used both for intra-operative guidance and for downstream skill analysis. Recovering such motion solely from surgical videos is desirable, as it is compliant with current clinical workflows and instrumentation. Methods: We present Tracker of Anatomy and Tool (TAToo). TAToo jointly tracks the rigid 3D motion of patient skull and surgical drill from stereo microscopic videos. TAToo estimates motion via an iterative optimization process in an end-to-end differentiable form. For robust tracking performance, TAToo adopts a probabilistic formulation and enforces geometric constraints on the object level. Results: We validate TAToo on both simulation data, where ground truth motion is available, as well as on anthropomorphic phantom data, where optical tracking provides a strong baseline. We report sub-millimeter and millimeter inter-frame tracking accuracy for skull and drill, respectively, with rotation errors below 1{\deg}. We further illustrate how TAToo may be used in a surgical navigation setting. Conclusion: We present TAToo, which simultaneously tracks the surgical tool and the patient anatomy in skull-base surgery. TAToo directly predicts the motion from surgical videos, without the need of any markers. Our results show that the performance of TAToo compares favorably to competing approaches. Future work will include fine-tuning of our depth network to reach a 1 mm clinical accuracy goal desired for surgical applications in the skull base.
translated by 谷歌翻译
We present temporally layered architecture (TLA), a biologically inspired system for temporally adaptive distributed control. TLA layers a fast and a slow controller together to achieve temporal abstraction that allows each layer to focus on a different time-scale. Our design is biologically inspired and draws on the architecture of the human brain which executes actions at different timescales depending on the environment's demands. Such distributed control design is widespread across biological systems because it increases survivability and accuracy in certain and uncertain environments. We demonstrate that TLA can provide many advantages over existing approaches, including persistent exploration, adaptive control, explainable temporal behavior, compute efficiency and distributed control. We present two different algorithms for training TLA: (a) Closed-loop control, where the fast controller is trained over a pre-trained slow controller, allowing better exploration for the fast controller and closed-loop control where the fast controller decides whether to "act-or-not" at each timestep; and (b) Partially open loop control, where the slow controller is trained over a pre-trained fast controller, allowing for open loop-control where the slow controller picks a temporally extended action or defers the next n-actions to the fast controller. We evaluated our method on a suite of continuous control tasks and demonstrate the advantages of TLA over several strong baselines.
translated by 谷歌翻译
The xView2 competition and xBD dataset spurred significant advancements in overhead building damage detection, but the competition's pixel level scoring can lead to reduced solution performance in areas with tight clusters of buildings or uninformative context. We seek to advance automatic building damage assessment for disaster relief by proposing an auxiliary challenge to the original xView2 competition. This new challenge involves a new dataset and metrics indicating solution performance when damage is more local and limited than in xBD. Our challenge measures a network's ability to identify individual buildings and their damage level without excessive reliance on the buildings' surroundings. Methods that succeed on this challenge will provide more fine-grained, precise damage information than original xView2 solutions. The best-performing xView2 networks' performances dropped noticeably in our new limited/local damage detection task. The common causes of failure observed are that (1) building objects and their classifications are not separated well, and (2) when they are, the classification is strongly biased by surrounding buildings and other damage context. Thus, we release our augmented version of the dataset with additional object-level scoring metrics https://gitlab.kitware.com/dennis.melamed/xfbd to test independence and separability of building objects, alongside the pixel-level performance metrics of the original competition. We also experiment with new baseline models which improve independence and separability of building damage predictions. Our results indicate that building damage detection is not a fully-solved problem, and we invite others to use and build on our dataset augmentations and metrics.
translated by 谷歌翻译
Knowledge of the symmetries of reinforcement learning (RL) systems can be used to create compressed and semantically meaningful representations of a low-level state space. We present a method of automatically detecting RL symmetries directly from raw trajectory data without requiring active control of the system. Our method generates candidate symmetries and trains a recurrent neural network (RNN) to discriminate between the original trajectories and the transformed trajectories for each candidate symmetry. The RNN discriminator's accuracy for each candidate reveals how symmetric the system is under that transformation. This information can be used to create high-level representations that are invariant to all symmetries on a dataset level and to communicate properties of the RL behavior to users. We show in experiments on two simulated RL use cases (a pusher robot and a UAV flying in wind) that our method can determine the symmetries underlying both the environment physics and the trained RL policy.
translated by 谷歌翻译
Hidden parameters are latent variables in reinforcement learning (RL) environments that are constant over the course of a trajectory. Understanding what, if any, hidden parameters affect a particular environment can aid both the development and appropriate usage of RL systems. We present an unsupervised method to map RL trajectories into a feature space where distance represents the relative difference in system behavior due to hidden parameters. Our approach disentangles the effects of hidden parameters by leveraging a recurrent neural network (RNN) world model as used in model-based RL. First, we alter the standard world model training algorithm to isolate the hidden parameter information in the world model memory. Then, we use a metric learning approach to map the RNN memory into a space with a distance metric approximating a bisimulation metric with respect to the hidden parameters. The resulting disentangled feature space can be used to meaningfully relate trajectories to each other and analyze the hidden parameter. We demonstrate our approach on four hidden parameters across three RL environments. Finally we present two methods to help identify and understand the effects of hidden parameters on systems.
translated by 谷歌翻译
Visual Inertial Odometry (VIO) is one of the most established state estimation methods for mobile platforms. However, when visual tracking fails, VIO algorithms quickly diverge due to rapid error accumulation during inertial data integration. This error is typically modeled as a combination of additive Gaussian noise and a slowly changing bias which evolves as a random walk. In this work, we propose to train a neural network to learn the true bias evolution. We implement and compare two common sequential deep learning architectures: LSTMs and Transformers. Our approach follows from recent learning-based inertial estimators, but, instead of learning a motion model, we target IMU bias explicitly, which allows us to generalize to locomotion patterns unseen in training. We show that our proposed method improves state estimation in visually challenging situations across a wide range of motions by quadrupedal robots, walking humans, and drones. Our experiments show an average 15% reduction in drift rate, with much larger reductions when there is total vision failure. Importantly, we also demonstrate that models trained with one locomotion pattern (human walking) can be applied to another (quadruped robot trotting) without retraining.
translated by 谷歌翻译
通过将从地面视图摄像头拍摄到从卫星或飞机上拍摄的架空图像的图像,通过将代理定位在搜索区域内,将代理定位在搜索区域内,将代理定位在搜索区域中。尽管地面图像和架空图像之间的观点差异使得跨视图地理定位具有挑战性,但假设地面代理可以使用全景相机,则取得了重大进展。例如,我们先前的工作(WAG)引入了搜索区域离散化,训练损失和粒子过滤器加权的变化,从而实现了城市规模的全景跨视图地理定位。但是,由于其复杂性和成本,全景相机并未在现有机器人平台中广泛使用。非Panoramic跨视图地理定位更适用于机器人技术,但也更具挑战性。本文介绍了受限的FOV广泛地理定位(Rewag),这是一种跨视图地理定位方法,通过创建姿势吸引的嵌入并提供将粒子姿势纳入暹罗网络,将其概括为与标准的非填充地面摄像机一起使用,以供与标准的非卧型地面摄像机一起使用。 Rewag是一种神经网络和粒子滤波器系统,能够在GPS下的环境中全球定位移动代理,仅具有探测仪和90度FOV摄像机,其本地化精度与使用全景相机实现并提高本地化精度相似的定位精度与基线视觉变压器(VIT)方法相比,100倍。一个视频亮点,该视频亮点在https://youtu.be/u_obqrt8qce上展示了几十公里的测试路径上的收敛。
translated by 谷歌翻译
学习电子健康记录(EHRS)表示是一个杰出但未被发现的研究主题。它受益于各种临床决策支持应用,例如药物结果预测或患者相似性搜索。当前的方法集中在特定于任务的标签监督上,对矢量化的顺序EHR,这不适用于大规模无监督的方案。最近,对比度学习在自我监督的代表性学习问题上显示出巨大的成功。但是,复杂的时间性通常会降低表现。我们提出了图形内核信息,这是EHR图形表示的一种自我监督的图内学习方法,以克服先前的问题。与最新的艺术品不同,我们不会更改图形结构以构建增强视图。取而代之的是,我们使用内核子空间扩展将节点嵌入两个几何不同的流形视图中。整个框架是通过通过常用的对比目标在这两种歧管视图上对比的节点和图形表示训练的。从经验上讲,使用公开可用的基准EHR数据集,我们的方法在超过最先进的临床下游任务上产生了表现。从理论上讲,距离指标的变化自然会在不改变图形结构的情况下创建不同的视图作为数据增强。
translated by 谷歌翻译